Abstract

Some criticisms that have been raised against the Cox approach to 
probability theory are addressed. Should we use a single real number to 
measure a degree of rational belief? Can beliefs be compared? Are the 
Cox axioms obvious? Are there counterexamples to Cox? Rather than 
justifying Cox's choice of axioms we follow a different path and derive the 
sum and product rules of probability theory as the unique (up to regradu- 
ations) consistent representations of the Boolean AND and OR operations. 



1 Introduction 

The objective of the Cox approach to probability theory is to develop tools for 
reasoning under conditions of uncertainty [1] [2] [3]. The method proposed by Cox
amounts to ranking statements according to the extent to which one is rationally 
justified in believing them. The ranking is implemented by associating to each 
statement a real number meant to represent a degree of rational belief. It is 
perhaps surprising that a bare minimum of rationality, namely, a requirement 
of consistency, is sufficient to yield a precise quantitative formalism. Cox's 
remarkable theorem states that ranking according to degrees of rational belief 
is equivalent to following the rules of probability theory. 

The importance of Cox's approach is, first, that it allows one to represent 
a partial state of knowledge as a consistent web of interconnected beliefs and, 
second, that it solves the long standing problem of interpretation: degrees of 
belief are to be manipulated according to the mathematical rules of probability 
theory and therefore no mistakes will ever be made if we call them "probabil- 
ities". These are not modest claims and it is only appropriate that the Cox 
approach be subjected to a severe critical scrutiny. The purpose of this paper 
is to address some of the criticisms that have been raised over the years. 

A thoughtful overview and general criticism of induction theories appears in 
the work of J. D. Norton [4]. He points out that in order to accept the Cox
argument one must be convinced that beliefs come in numerical degrees, that
beliefs can be compared, and furthermore, that they must be transitive (if a is
preferred over b, and b over c, then a is preferred over c). Not obvious at all,
he claims, a single number may not be sufficient to capture the richness of our 
beliefs which could very well be intransitive or even incommensurate. And the 
doubts do not end there. 

Cox assumes as one of his axioms that the degree of belief in a proposition
a assuming that b is true, which we write as $[a|b]$, is rigidly related to the
degree corresponding to its negation, $[\tilde{a}|b]$, through some definite but initially
unspecified function $f$,

$$[\tilde{a}|b] = f([a|b]) . \tag{1}$$

To some of us this is intuitively reasonable: the more one believes in $a|b$, the
less one believes in $\tilde{a}|b$. Not obvious, writes Norton, "if one does not prejudge
what 'belief' must be, the assumption of this specific functional dependency is
alien and arbitrary".

A second Cox axiom is that the degree of belief of "a and b assuming c,"
written as $[ab|c]$, must depend on $[a|c]$ and $[b|ac]$,

$$[ab|c] = g([a|c], [b|ac]) . \tag{2}$$

This is also very reasonable. When asked to check whether "a and b" is true,
we first look at a; if a turns out to be false the conjunction is false and we need
not bother with b; therefore $[ab|c]$ must depend on $[a|c]$. If a turns out to be true
we need to take a further look at b; therefore $[ab|c]$ must also depend on $[b|ac]$.
Norton objects that it may be reasonable, but it is also an assumption "likely 
to be uncontroversial only for someone who already believes that plausibilities 
are probabilities and has tacitly in mind that we must eventually recover the 
product rule" [4]. 

Strictly $[ab|c]$ could in principle depend on all four quantities $[a|c]$, $[b|c]$, $[a|bc]$
and $[b|ac]$, an objection that has a long history. It was partially addressed by
Tribus [5] and then by Smith and Erickson [6] but, unfortunately, as has been
convincingly pointed out by Garrett [7], their arguments are not completely 
satisfactory. 

Yet another objection has been raised by Halpern [8]. He shows that in 
finite domains it is possible to satisfy the consistency constraint that follows 
from the associativity of the Boolean AND, $(ab)c = a(bc)$, without requiring
that the function g in eq. (2) be itself associative. This allows him to construct
counterexamples to Cox's theorem. 

In section 2 we discuss degrees of rational belief and why we are justified in 
representing them by real numbers. Then to (partially) counter the objection 
that the Cox axioms are intuitive only to those who are already convinced of the 
results we reformulate the Cox theory in terms of axioms that differ from the 
usual ones. The idea is to construct a representation of the Boolean AND and
OR by focusing on their associative and distributive properties rather than on 
the operation of negation. We then argue that it is the nature of our goal — to 
construct an inductive logic of general applicability — that allows us to escape 
Halpern's criticism, and also to give a proper treatment to the Tribus-Smith-
Erickson objection. In section 3 an associativity constraint is used to derive the 
sum rule rather than the product rule as Cox had originally done, and in section 
4 we focus on the distributive property of AND over OR to obtain the product 
rule. The negation function $f$ in eq. (1) and the functional equation associated
with it are completely avoided. 

Our subject is degrees of rational belief but the algebraic approach followed 
here can be pursued in its own right irrespective of any interpretation. It was 
used in [9] to derive the manipulation rules for complex numbers interpreted
as quantum mechanical amplitudes. It was also used by K. Knuth [10] in the
purely mathematical problem of assigning real numbers (valuations) on general 
distributive lattices. 

2 Degrees of rational belief 

Different individuals may hold different beliefs and it is certainly important to 
figure out what those beliefs might be — perhaps by observing their gambling 
behavior — but this is not our present concern. Our objective is neither to assess 
nor to describe the subjective beliefs of any particular individual. Instead we 
deal with the altogether different but very common problem that arises when 
we are confused and we want some guidance about what we are supposed to 
believe. Our concern here is not so much with beliefs as they actually are, but 
rather, with beliefs as they ought to be. 

Rational beliefs are constrained beliefs. Indeed, the essence of rationality 
lies precisely in the existence of some constraints. The problem, of course, is 
to figure out what those constraints might be. We need to identify normative 
criteria of rationality. It must be stressed that the beliefs discussed here are 
meant to be those held by an idealized rational individual who is not subject 
to practical human limitations. We are concerned with those ideal standards of 
rationality that we ought to strive to attain at least when discussing scientific 
matters. 

Here is our first criterion of rationality: whatever guidelines we pick they 
must be of general applicability — otherwise they fail when most needed, namely, 
when not much is known about a problem. Different rational individuals can 
reason about different topics, or about the same subject but on the basis of 
different information, and therefore they could hold different beliefs, but they 
must agree to follow the same rules. 

As a second criterion of (extremely idealized) rationality we require theories 
that allow quantitative reasoning. The obvious question concerns the type of 
quantity that will represent the intensity of beliefs. Discrete categorical variables 
are not adequate for a theory of general applicability; we need a much more 
refined scheme. 

Do we believe statement a more or less than statement b? Are we even
justified in comparing statements a and b? The problem with statements is not
that they cannot be compared but rather that the comparison can be carried out
in too many different ways. We can classify statements according to the degree
we believe they are true, their plausibility; or according to the degree that we 
desire them to be true, their utility; or according to the degree that they happen 
to bear on a particular issue at hand, their relevance. We can even compare 
propositions with respect to the minimal number of bits that are required to 
state them, the description length. The detailed nature of our relations to 
statements is too complex to be captured by a single real number. What we 
claim is that a single real number is sufficient to measure one specific feature, the 
sheer intensity of rational belief. This should not be too controversial because 
it amounts to a tautology: an "intensity" is precisely the type of quantity that 
admits no more qualifications than that of being more intense or less intense; it 
is captured by a single real number. 

However, some preconception about our subject is unavoidable; we need 
some rough notion that a belief is not the same thing as a desire. But how 
can we know that we have captured pure belief and not belief contaminated 
with some hidden desire? Strictly we can't. We hope that our mathematical 
description captures a sufficiently purified notion of rational belief, and we can 
claim success only to the extent that the formalism proves to be useful. Since 
some preconceived notions are needed, here is one: we take it to be a defining 
feature of the intensity of rational beliefs that if a is more believable than b, 
and b more than c, then a is more believable than c. Such transitive rankings
can be implemented using real numbers, and we are again led to claim that degrees
of rational belief can be represented by real numbers.

The notation we use is fairly standard: given any two statements a and b the
disjunction "a OR b" and the conjunction "a AND b" are denoted respectively
by $a \vee b$ and $ab$. Typically we want to quantify our beliefs in $a \vee b$ and in $ab$ in
the context of some background information c, which we write as $a \vee b|c$ and $ab|c$. The real
number that represents the degree of belief in $a|b$ will initially be denoted by
$[a|b]$. Degrees of rational belief will range from the extreme of total certainty,
$[a|a] = v_T$, to total disbelief, $[\tilde{a}|a] = v_F$. Note that the transitivity of the
ranking scheme implies that there is a single value $v_F$ and a single $v_T$.

Here is a second preconceived notion: in order to be rational our beliefs in 
$a \vee b$ and $ab$ must be somehow related to our separate beliefs in a and b. Since
the goal is to design a quantitative theory, we require that these relations be
represented by some functions F and G,

$$[a \vee b|c] = F([a|c], [b|c], [a|bc], [b|ac]) \tag{3}$$

and

$$[ab|c] = G([a|c], [b|c], [a|bc], [b|ac]) . \tag{4}$$

Note the qualitative nature of this assumption: what is being asserted is the 
existence of some unspecified functions F and G and not their specific functional 
forms. The same F and G are meant to apply to all propositions; what is being 
designed is a single inductive scheme of universal applicability. Note further 
that unlike eq. (2) the arguments of F and G include all four possible degrees of
belief in a and b in the context of c and not any potentially questionable subset.

Since it is quite inconceivable that a smooth change in, say, $[a|c]$ could lead
to anything but a smooth change in $[a \vee b|c]$ and $[ab|c]$, we will assume that the
functions F and G are sufficiently smooth and well behaved. Indeed, should it
turn out that F and G require kinks or discontinuities we would probably feel
justified in throwing the whole scheme away. (However, smoothness might not
be necessary. See [11].)

Our method is one of eliminative induction: now that we have identified a 
sufficiently broad class of theories — quantitative theories of universal applica- 
bility, with degrees of belief represented by real numbers and the operations of 
conjunction and disjunction represented by functions — we can start weeding the 
unacceptable ones out. 

3 The sum rule 

We start with the function F that represents OR. The space of functions of four 
arguments is very large. To narrow down the field we initially restrict ourselves 
to propositions a and b that are mutually exclusive in the context of d. Thus, 

$$[a \vee b|d] = F([a|d], [b|d], v_F, v_F) , \tag{5}$$

which effectively restricts F to a function of only two arguments. 
The associativity constraint: 

We require that the assignment of degrees of belief be consistent — if a degree 
of belief can be computed in two different ways the two ways must agree — how 
else could we claim to be rational? All functions F that fail to satisfy this 
constraint must be discarded. 

Consider any three mutually exclusive statements a, b, and c in the context
of a fourth d. The consistency constraint that follows from the associativity of
the Boolean OR, $(a \vee b) \vee c = a \vee (b \vee c)$, is remarkably constraining. It essentially
determines the function F. Start from

$$[a \vee b \vee c|d] = F([a \vee b|d], [c|d]) = F([a|d], [b \vee c|d]) , \tag{6}$$

and using F once again for $[a \vee b|d]$ and for $[b \vee c|d]$, we get

$$F\{F([a|d], [b|d]), [c|d]\} = F\{[a|d], F([b|d], [c|d])\} . \tag{7}$$

If we call $[a|d] = x$, $[b|d] = y$, and $[c|d] = z$, then

$$F\{F(x,y), z\} = F\{x, F(y,z)\} . \tag{8}$$

The function F must obey (8) for arbitrary choices of the propositions a, b, c,
and d.
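The weeding-out role of the associativity constraint (8) can be illustrated numerically. The following is only a sketch: the three candidate rules are hypothetical examples chosen for illustration, not functions discussed in the text, and a test on a finite grid can reject a candidate but cannot prove that it is associative everywhere.

```python
import itertools
import math

def is_associative(F, values, tol=1e-9):
    """Check F(F(x, y), z) == F(x, F(y, z)) on every triple drawn from `values`."""
    for x, y, z in itertools.product(values, repeat=3):
        left = F(F(x, y), z)
        right = F(x, F(y, z))
        if not math.isclose(left, right, rel_tol=tol, abs_tol=tol):
            return False
    return True

# Hypothetical candidate rules for combining two degrees of belief.
def plain_sum(x, y):
    return x + y

def average(x, y):
    return (x + y) / 2.0

def sum_plus_product(x, y):
    # Equals phi_inv(phi(x) + phi(y)) with phi(x) = log(1 + x).
    return x + y + x * y

grid = [0.0, 0.1, 0.25, 0.5, 0.8]

results = {
    "plain_sum": is_associative(plain_sum, grid),
    "average": is_associative(average, grid),
    "sum_plus_product": is_associative(sum_plus_product, grid),
}
print(results)
```

On this grid the averaging rule is rejected while the other two candidates survive.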

Halpern has raised the following objection [8]. Suppose we have a belief
function $[a|d]$ that associates a real number to each pair of propositions a and
d. He observes that if the total number of such propositions is finite (a discrete
universe of discourse) then the triples $(x, y, z)$ to be used in (8) do not form a
dense set and therefore we are not allowed to conclude that the function F must
itself be associative for arbitrary values of x, y, and z. Thus, in finite universes 
of discourse it is possible to design models of inference that are consistent with- 
out being equivalent to probability theory, and Halpern constructs an explicit 
example. 

The reply to Halpern's objection is not to be found in any flaw in his math- 
ematics. We must rather focus on the larger project at hand. We are concerned 
with designing a theory of inference of universal applicability, a single scheme 
to be used by all rational individuals irrespective of their state of knowledge or 
of subject matter. One individual might assign a plausibility $[a|d] = x$ while
another, who is in possession of different information, might assign $[a|d]' = x'$,
while a third would assign $x''$ and so on. Thus the values x form a dense set,
not because the allowed propositions a, b, ... are themselves dense, but rather
because the belief functions $[\cdot|\cdot]$, $[\cdot|\cdot]'$, ... are dense. Furthermore, the same
general purpose scheme must be applicable to arbitrary subject matter, not just to
one particular discrete set, but also to continuous sets of propositions. We
conclude that in order to be of universal applicability the function F must indeed
be associative and satisfy (8) for arbitrary values of $(x, y, z)$.
The general solution and its regraduation: 

By straightforward substitution one can check that eq. (8) is satisfied if

$$F(x,y) = \phi^{-1}\left(\phi(x) + \phi(y)\right) , \tag{9}$$

where $\phi$ is an arbitrary invertible function. It has been shown that this is also
the general solution [1] [11]. Given $\phi$ one can calculate F and, conversely, given
F one can calculate the corresponding $\phi$. Eq. (9) can be rewritten as

$$\phi(F(x,y)) = \phi(x) + \phi(y) \quad\text{or}\quad \phi([a \vee b|d]) = \phi([a|d]) + \phi([b|d]) . \tag{10}$$

This last form is the pivotal point of the whole argument: it shows that instead
of representing degrees of belief along the scale provided by the numbers $[a|d]$,
we can equally well regraduate to a new scale given by $\xi(a|d) = \phi([a|d])$. The
original and the regraduated scales are equivalent because being invertible the
function $\phi$ is monotonic and preserves the ranking of propositions. However,
the regraduated scale is much more convenient because the OR operation is now
represented by a much simpler sum rule,

$$\xi(a \vee b|d) = \xi(a|d) + \xi(b|d) . \tag{11}$$

The regraduated $\xi_F = \phi(v_F)$ is easy to evaluate. Setting $d = \tilde{a}$ in eq. (11)
gives $\xi(a \vee b|\tilde{a}) = \xi(a|\tilde{a}) + \xi(b|\tilde{a})$. Since $a \vee b|\tilde{a}$ is true if and only if $b|\tilde{a}$ is
true, the corresponding degrees of belief must coincide, $\xi(a \vee b|\tilde{a}) = \xi(b|\tilde{a})$, and
therefore $\xi(a|\tilde{a}) = \xi_F = 0$.
The general sum rule:

The restriction to mutually exclusive propositions in the sum rule eq. (11)
can easily be lifted. Any proposition a can be written as the disjunction of two
mutually exclusive ones, $a = (ab) \vee (a\tilde{b})$ and similarly $b = (ab) \vee (\tilde{a}b)$. Therefore
for any two arbitrary propositions a and b we have

$$a \vee b = (ab) \vee (a\tilde{b}) \vee (\tilde{a}b) . \tag{12}$$

Since the terms on the right are mutually exclusive the sum rule (11)
applies,

\begin{align}
\xi(a \vee b|d) &= \xi(ab|d) + \xi(a\tilde{b}|d) + \xi(\tilde{a}b|d) + [\xi(ab|d) - \xi(ab|d)] \notag \\
&= \xi(ab \vee a\tilde{b}|d) + \xi(ab \vee \tilde{a}b|d) - \xi(ab|d) , \tag{13}
\end{align}

which leads to the general sum rule,

$$\xi(a \vee b|d) = \xi(a|d) + \xi(b|d) - \xi(ab|d) . \tag{14}$$
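Regraduation and the general sum rule can be checked on a small example. The sketch below is an illustration under stated assumptions, not part of the derivation: it assumes a toy universe of six equally weighted outcomes, represents statements as sets of outcomes, and picks the hypothetical regraduation function phi(x) = x**2, so that the raw scale is the square root of the additive scale.

```python
import math
from fractions import Fraction

# Assumed toy model: statements are sets of outcomes, all outcomes weighted equally.
OUTCOMES = frozenset(range(6))

def xi(a, d=OUTCOMES):
    """Additive degree of belief in a given d: fraction of d's outcomes lying in a."""
    return Fraction(len(a & d), len(d))

def phi(x):
    """Hypothetical regraduation from the raw scale to the additive scale."""
    return x ** 2

def phi_inv(s):
    return math.sqrt(s)

def raw(a, d=OUTCOMES):
    """Raw-scale degree of belief, the analogue of [a|d]."""
    return phi_inv(float(xi(a, d)))

def F(x, y):
    """Raw-scale OR for mutually exclusive statements, in the form of eq. (9)."""
    return phi_inv(phi(x) + phi(y))

a = frozenset({0, 1, 2})
b = frozenset({2, 3})      # overlaps with a
c = frozenset({4})         # mutually exclusive with a

# General sum rule, eq. (14), in exact arithmetic (here | and & are set union
# and intersection, standing for OR and AND).
lhs = xi(a | b)
rhs = xi(a) + xi(b) - xi(a & b)

# On the raw scale the same OR is represented by the less convenient F.
raw_or = F(raw(a), raw(c))
```

Both scales rank the statements identically because phi is monotonic on non-negative numbers; only the additive scale turns OR into a plain sum.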



4 The product rule 

Next we consider the function G that represents AND. The space of functions
of four arguments is very large so we first narrow it down to just two. Then,
we impose a consistency constraint that follows from the distributive properties
of the Boolean AND and OR. A trivial regraduation yields the product rule of
probability theory. 

From four arguments down to two: 

We will separately consider special cases where the function G depends on
only two arguments, then three, and finally all four arguments. Using
commutativity, $ab = ba$, the possibilities are seven:

\begin{align}
\xi(ab|c) &= G^{(1)}[\xi(a|c), \xi(b|c)] \tag{15} \\
\xi(ab|c) &= G^{(2)}[\xi(a|c), \xi(a|bc)] \tag{16} \\
\xi(ab|c) &= G^{(3)}[\xi(a|c), \xi(b|ac)] \tag{17} \\
\xi(ab|c) &= G^{(4)}[\xi(a|bc), \xi(b|ac)] \tag{18} \\
\xi(ab|c) &= G^{(5)}[\xi(a|c), \xi(b|c), \xi(a|bc)] \tag{19} \\
\xi(ab|c) &= G^{(6)}[\xi(a|c), \xi(a|bc), \xi(b|ac)] \tag{20} \\
\xi(ab|c) &= G^{(7)}[\xi(a|c), \xi(b|c), \xi(a|bc), \xi(b|ac)] \tag{21}
\end{align}

Since the method aims at general applicability the arguments of $G^{(1)} \ldots G^{(7)}$
can be varied independently.



First some notation: complete certainty is denoted $\xi_T$, while complete
disbelief is $\xi_F = 0$. Derivatives are denoted with a subscript: the derivative of
$G^{(3)}(x,y)$ with respect to its second argument y is $G^{(3)}_2(x,y)$.

Type 1: $\xi(ab|c) = G^{(1)}[\xi(a|c), \xi(b|c)]$. The function $G^{(1)}$ is unsatisfactory
because it does not take possible correlations between a and b into account. For
example, when a and b are mutually exclusive (say, $b = \tilde{a}d$ for some
arbitrary d) we have $\xi(ab|c) = \xi_F$ but there are no constraints on either $\xi(a|c) = x$ or
$\xi(b|c) = y$. Thus, in order that $G^{(1)}(x,y) = \xi_F$ for arbitrary choices of x and y,
$G^{(1)}$ must be a constant, which is unacceptable.

Type 2: $\xi(ab|c) = G^{(2)}[\xi(a|c), \xi(a|bc)]$. This function is unsatisfactory because
it overlooks the plausibility of $b|c$ [6]. For example: let a = "X is big" and b =
"X is big and green" so that $ab = b$. Then

$$\xi(b|c) = G^{(2)}[\xi(a|c), \xi(a|abc)] \quad\text{or}\quad \xi(b|c) = G^{(2)}[\xi(a|c), \xi_T] , \tag{22}$$

which is clearly unsatisfactory since "green" does not figure anywhere on the
right hand side.

Type 3: $\xi(ab|c) = G^{(3)}[\xi(a|c), \xi(b|ac)]$. This function turns out to be
satisfactory.

Type 4: $\xi(ab|c) = G^{(4)}[\xi(a|bc), \xi(b|ac)]$. This function strongly violates
common sense: when $a = b$ we have $\xi(a|c) = G^{(4)}(\xi_T, \xi_T)$, so that $\xi(a|c)$ takes the
same constant value irrespective of what a might be [5].

Type 5: $\xi(ab|c) = G^{(5)}[\xi(a|c), \xi(b|c), \xi(a|bc)]$. This function turns out to be
equivalent either to $G^{(1)}$ or to $G^{(3)}$ and can therefore be ignored. The proof
follows from associativity, $(ab)c|d = a(bc)|d$, which leads to the constraint

\begin{align}
& G^{(5)}\left[ G^{(5)}[\xi(a|d), \xi(b|d), \xi(a|bd)], \xi(c|d), G^{(5)}[\xi(a|cd), \xi(b|cd), \xi(a|bcd)] \right] \notag \\
&\quad = G^{(5)}\left[ \xi(a|d), G^{(5)}[\xi(b|d), \xi(c|d), \xi(b|cd)], \xi(a|bcd) \right] \notag
\end{align}

and, with the appropriate identifications,

$$G^{(5)}[G^{(5)}(x,y,z), u, G^{(5)}(v,w,s)] = G^{(5)}[x, G^{(5)}(y,u,w), s] . \tag{23}$$

Since the variables x, y, ..., s can be varied independently of each other we can
take a partial derivative with respect to z,

$$G^{(5)}_1[G^{(5)}(x,y,z), u, G^{(5)}(v,w,s)] \, G^{(5)}_3(x,y,z) = 0 . \tag{24}$$

Therefore, either

$$G^{(5)}_3(x,y,z) = 0 \quad\text{or}\quad G^{(5)}_1[G^{(5)}(x,y,z), u, G^{(5)}(v,w,s)] = 0 . \tag{25}$$

The first possibility says that $G^{(5)}$ is independent of its third argument which
means that it is of the type $G^{(1)}$ that has already been ruled out. The second
possibility says that $G^{(5)}$ is independent of its first argument which means that
it is already included among the type $G^{(3)}$ (with the roles of a and b exchanged).

Type 6: $\xi(ab|c) = G^{(6)}[\xi(a|c), \xi(a|bc), \xi(b|ac)]$. This function turns out to be
equivalent either to $G^{(3)}$ or to $G^{(4)}$ and can therefore be ignored. The proof,
which we omit because it is identical to the proof above for type 5, also follows
from associativity, $(ab)c|d = a(bc)|d$.

Type 7: $\xi(ab|c) = G^{(7)}[\xi(a|c), \xi(b|c), \xi(a|bc), \xi(b|ac)]$. This function turns out
to be equivalent either to $G^{(5)}$ or $G^{(6)}$ and can therefore be ignored. Again
the proof which uses associativity, $(ab)c|d = a(bc)|d$, is omitted because it is
identical to type 5.

We conclude that the possible functions G that are viable candidates for a
general theory of inductive inference are equivalent to type $G^{(3)}$,

$$\xi(ab|c) = G[\xi(a|c), \xi(b|ac)] . \tag{26}$$

The distributivity constraint: 

Consider three statements a, b, and c, where the last two are mutually
exclusive, in the context of a fourth, d. Distributivity, $a(b \vee c) = ab \vee ac$, implies that
$\xi(a(b \vee c)|d)$ can be computed in two ways, $\xi(a(b \vee c)|d) = \xi((ab|d) \vee (ac|d))$.
Using eqs. (11) and (26) leads to

$$G[\xi(a|d), \xi(b|ad) + \xi(c|ad)] = G[\xi(a|d), \xi(b|ad)] + G[\xi(a|d), \xi(c|ad)] . \tag{27}$$

Therefore, the requirement of distributivity constrains G to satisfy

$$G(u, v + w) = G(u, v) + G(u, w) , \tag{28}$$

where $\xi(a|d) = u$, $\xi(b|ad) = v$, and $\xi(c|ad) = w$. To solve this constraint let
$v + w = z$. Differentiating with respect to v and w gives $\partial^2 G(u,z)/\partial z^2 = 0$, so
that G is linear in its second argument, $G(u,v) = A(u)v + B(u)$. Substituting
back into eq. (28) gives $B(u) = 0$.
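The step from eq. (28) to linearity can be spelled out as follows. This is only a sketch, assuming G is twice differentiable in its second argument, as the smoothness assumption of section 2 allows; $G_2$ and $G_{22}$ denote first and second derivatives with respect to the second argument.

```latex
\begin{align*}
G(u, v+w) &= G(u,v) + G(u,w) \\
\frac{\partial}{\partial v}:\quad G_2(u, v+w) &= G_2(u, v) \\
\frac{\partial}{\partial w}:\quad G_{22}(u, v+w) &= 0
  \;\Rightarrow\; G(u,z) = A(u)\,z + B(u) \\
\text{substituting back:}\quad A(u)\,(v+w) + B(u) &= A(u)\,v + A(u)\,w + 2B(u)
  \;\Rightarrow\; B(u) = 0
\end{align*}
```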

To determine the function $A(u)$ [13] we note that the degree to which we
believe in $ad|d$ is exactly the degree to which we believe in $a|d$ by itself.
Therefore,

$$\xi(a|d) = \xi(ad|d) = G[\xi(a|d), \xi(d|ad)] = G[\xi(a|d), \xi_T] \quad\text{or}\quad u = A(u)\,\xi_T , \tag{29}$$

which means that

$$G(u,v) = \frac{uv}{\xi_T} \quad\text{or}\quad \xi(ab|d) = \frac{\xi(a|d)\,\xi(b|ad)}{\xi_T} . \tag{30}$$

The constant $\xi_T$ is easily regraduated away: just normalize $\xi$ to $p = \xi/\xi_T$. In
the regraduated scale the AND operation is represented by a simple product
rule,

$$p(ab|d) = p(a|d)\,p(b|ad) , \tag{31}$$

while the sum rule, eq. (14), remains unaffected,

$$p(a \vee b|d) = p(a|d) + p(b|d) - p(ab|d) . \tag{32}$$

Degrees of belief p measured in this particularly convenient regraduated scale
can be called "probabilities". The degrees of belief $\xi$ range from total disbelief
$\xi_F = 0$ to total certainty $\xi_T$. The corresponding regraduated values are $p_F = 0$
and $p_T = 1$.
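The role of the constant $\xi_T$ and its normalization can be illustrated with a small exact computation. The sketch below rests on assumptions that are not in the text: a toy universe of twelve equally weighted outcomes, statements represented as sets, and an arbitrary choice $\xi_T = 100$ (a percent-like scale).

```python
from fractions import Fraction

XI_T = Fraction(100)  # assumed value of total certainty on the xi scale

def xi(a, d):
    """Degree of belief in a given d on a 0..XI_T scale (equal-weight counting model)."""
    return XI_T * Fraction(len(a & d), len(d))

def G(u, v):
    """Representation of AND before normalization, in the form of eq. (30)."""
    return u * v / XI_T

def p(a, d):
    """Regraduated scale p = xi / xi_T."""
    return xi(a, d) / XI_T

d = frozenset(range(12))
a = frozenset({0, 1, 2, 3, 4, 5})
b = frozenset({4, 5, 6, 7})

# Here & and | are set intersection and union, standing for AND and OR.
xi_ab = xi(a & b, d)                   # direct evaluation of xi(ab|d)
xi_via_G = G(xi(a, d), xi(b, a & d))   # G[xi(a|d), xi(b|ad)]

product_rule_holds = p(a & b, d) == p(a, d) * p(b, a & d)           # eq. (31)
sum_rule_holds = p(a | b, d) == p(a, d) + p(b, d) - p(a & b, d)     # eq. (32)
```

Dividing by XI_T changes nothing in the ranking of statements; it only removes the constant from the product rule.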

5 Conclusion



Probability theory is the unique method of rational, quantitative and consistent 
inductive inference that can claim to be of general applicability. It focuses on 
degrees of rational belief and not on other qualities such as simplicity, explana- 
tory power, degree of confirmation, desirability, or amount of information. The 
reason the method is unique is not because we have succeeded in formulating a 
precise and rigorous definition of rationality. Rather, the method is unique for 
the more modest reason that it is the only one left after obvious irrationalities 
— such as inconsistencies — have been weeded out. 

Acknowledgements: I am grateful to K. Knuth and C. Rodriguez for many
discussions on these topics, and especially to Nestor Caticha who drew my
attention to an error and indicated the appropriate correction. I also thank T.
Loredo who, after completion of this paper, drew my attention to reference [12]
which also invokes the requirement of universality to evade Halpern's objection. 

References 

[1] R. T. Cox, Am. J. Phys. 14, 1-13 (1946). 

[2] E. T. Jaynes, Probability Theory: The Logic of Science, ed. by L. Bretthorst
(Cambridge U.P., 2003). 

[3] A. Caticha, Lectures on Probability, Entropy, and Statistical Physics 
(MaxEnt08, Sao Paulo, 2008) (arXiv.org/abs/0808.0012).

[4] J. D. Norton, Brit. J. Phil. Sci. 58, 141 (2007). 

[5] M. Tribus, Rational Descriptions, Decisions and Designs (Pergamon, 
1969). 

[6] C. R. Smith, G. J. Erickson, p. 17 in Maximum Entropy and Bayesian Meth- 
ods ed. by P. F. Fougere (Kluwer, 1990). 

[7] A. Garrett, p. 175 in Maximum Entropy and Bayesian Methods ed. by G. 
R. Heidbreder (Kluwer, 1996). 

[8] J. Y. Halpern, Journal of Artificial Intelligence Research 10, 67 (1999). 

[9] A. Caticha, Phys. Rev. A57, 1572 (1998) (arXiv.org/abs/quant- 
ph/9804012). 

[10] K. H. Knuth, p. 204 in Bayesian Inference and Maximum Entropy Methods
in Science and Engineering, ed. by G.J. Erickson and Y. Zhai, AIP Conf. 
Proc. 707 (2003). 

[11] J. Aczel, Lectures on Functional Equations and Their Applications
(Academic Press, New York, 1966).






[12] K. Van Horn, Int. J. Approx. Reasoning 34, 3 (2003). 

[13] This argument is due to N. Caticha, private communication (2009). 






